Robust speech recognition using the modulation spectrogram

Authors

  • Brian Kingsbury
  • Nelson Morgan
  • Steven Greenberg
Abstract

The performance of present-day automatic speech recognition (ASR) systems is seriously compromised by levels of acoustic interference (such as additive noise and room reverberation) representative of real-world speaking conditions. Studies on the perception of speech by human listeners suggest that recognizer robustness might be improved by focusing on temporal structure in the speech signal that appears as low-frequency (below 16 Hz) amplitude modulations in subband channels following critical-band frequency analysis. A speech representation that emphasizes this temporal structure, the "modulation spectrogram", has been developed. Visual displays of speech produced with the modulation spectrogram are relatively stable in the presence of high levels of background noise and reverberation. Using the modulation spectrogram as a front end for ASR provides a significant improvement in performance on highly reverberant speech. When the modulation spectrogram is used in combination with log-RASTA-PLP (log RelAtive SpecTrAl Perceptual Linear Predictive analysis) performance over a range of noisy and reverberant conditions is significantly improved, suggesting that the use of multiple representations is another promising method for improving the robustness of ASR systems. © 1998 Elsevier Science B.V. All rights reserved.
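To make the idea of "low-frequency (below 16 Hz) amplitude modulations in subband channels" concrete, here is a minimal Python sketch using NumPy. It is not the authors' pipeline: the abstract gives no implementation details, so the full-wave rectification, the brick-wall FFT low-pass, and the function names `lowpass_envelope` and `dominant_modulation_hz` are all illustrative assumptions. The input is assumed to be a single subband signal that has already passed through some band-pass (for example critical-band) filter.

```python
import numpy as np


def lowpass_envelope(subband, sample_rate, cutoff_hz=16.0):
    """Return the amplitude envelope of one subband, low-passed at cutoff_hz.

    Hypothetical helper: rectification and FFT masking are assumptions,
    not the method described in the paper.
    """
    # Full-wave rectification as a crude envelope detector.
    envelope = np.abs(np.asarray(subband, dtype=float))
    spectrum = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / sample_rate)
    # Zero every modulation component above the cutoff (brick-wall low-pass).
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=envelope.size)


def dominant_modulation_hz(envelope, sample_rate):
    """Return the frequency of the strongest non-DC component of an envelope."""
    centered = np.asarray(envelope, dtype=float) - np.mean(envelope)
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(centered.size, d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])
```

Applied to a 1 kHz carrier whose amplitude varies at 4 Hz, `lowpass_envelope` discards the carrier and keeps the slow 4 Hz variation. That slow variation is the kind of temporal structure the abstract refers to.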

Similar resources

A Missing Data Approach for Robust Automatic Speech Recognition in the Presence of Reverberation

We describe a technique for robust recognition of reverberated speech using the ‘missing data’ paradigm. Modulation filtering is used to identify time-frequency regions of the speech signal which are relatively uncontaminated by reverberation and contain strong speech energy; only these ‘reliable’ acoustic features are made directly available to the recogniser. The proposed system is evaluated ...

Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR

The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments in part due to the lack of robustness in the underlying representations used in the ASR system. On the other hand, the auditory processing studies have shown the importance of modulation filtered spectrogram representations in robust human speech recognition. Inspired by these...

Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...

A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the signal's time-frequency representation (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...

A Spectro-Temporal Demodulation Technique for Pitch Estimation

We consider a two-dimensional demodulation framework for spectro-temporal analysis of the speech signal. We construct narrowband (NB) speech spectrograms, and demodulate them using the Riesz transform, which is a two-dimensional extension of the Hilbert transform. The demodulation results in time-frequency envelope (amplitude modulation or AM) and time-frequency carrier (frequency modulation or F...

Journal:
  • Speech Communication

Volume 25

Publication date: 1998